Final Report for Award Number DAMD17-03-1-0648

Funding  period  09/17/03  -  09/16/04 
PI:  O’Connell,  P 

Introduction 

The  presence  or  absence  of  systemic  disease  is  the  most  crucial  factor  determining 
survival  versus  mortality  in  women  with  breast  cancer.  Identifying  high  risk  women  and  ensuring 
they  receive  appropriate  systemic  treatment  reduces  risk  of  death  from  breast  cancer 
metastasis. However, other than local cancer spread, tumor size, and proliferation, very few
clinically useful prognostic markers exist, especially for lymph node-negative patients.

We have identified a candidate gene, designated metastasis-associated 1 (MTA1), as a
possible prognostic and predictive marker by genomic analysis, IHC of breast tumors, and
gene expression analysis (1, 4). Our aims are to identify MTA1-regulated genes to further
assess genetic mechanisms promoting breast cancer spread.

Body  of  Research 

Inferring MTA1’s role in metastasis is complicated by a rapidly growing MTA gene family,
with six alternatively spliced variants published so far, encoded at three genomic loci (i.e., MTA1-14q;
MTA2, aka MTA1L1-11q; and MTA3, aka MTA2-2q). Nuclear MTA1 protein functions as a steroid
hormone receptor co-repressor (5, 6). Multiple alignments of the MTA1, -2, and -3 genes
identified a 14q-locus-specific peptide which, when linked to a hapten and used to immunize
rabbits, yielded an MTA1-specific polyclonal antibody. Other key reagents for this study were
the breast cancer cell lines MCF7, T47D, MDA-MB-435S, and MDA-MB-231, and an antisense
MTA1 morpholino oligomer (AS-MTA1) designed to knock down MTA1 protein translation.
Equipped with these reagents, we 1) studied the cellular distribution of MTA1 isoforms; 2) carried
out AS-MTA1 transfections and monitored turnover of endogenous MTA1 protein by western blot
analysis; and 3) analyzed differential gene expression in AS-MTA1 versus control-treated
breast tumor cells to identify MTA1-regulated genes.

Several  observations  emerged  from  these  studies. 

1. The transfection method used (EPEI, GeneTools, Inc) salts out AS-MTA1 oligomers
and carrier DNA in a precipitate that, upon phagocytosis by the cell, oxidizes and chemically bursts
endosomes to release the AS-MTA1 oligomer. This technique proved reliable in metastatic
MDA-MB-435S and MDA-MB-231 cells, but the MCF7 and T47D cell lines proved completely
refractory to AS-MTA1 treatment, possibly because their limited motility reduces opportunities to accumulate
the AS-MTA1/DNA/EPEI precipitate. We focused on the metastatic MDA-MB-231 and -435S cell
lines to identify MTA1-regulated genes and, as an alternative to AS-MTA1 in the treatment-resistant
MCF7 and T47D cells, plan to make stable transfectants overexpressing MTA1. Figure 1
shows, by western blot, that MTA1 protein is suppressed upon AS-MTA1 treatment.


Figure 1. Western blot analysis of antibody specificity, MTA1 isoforms
and their cellular localization. Lanes 1-4: The MCF7 breast cancer cell line
and its MTA1 isoforms (1- total cell lysate, 2- nuclear, 3- cytoplasmic, 4-
MCF7 overexpressing pcDNA3.1-MTA1). Lanes 5-8: Western blot with same
samples as 1-4, but with the anti-MTA1 antibody preabsorbed with excess
MTA1 peptide immunogen to validate specificity.

2. MTA1 undergoes alternative splicing to the nuclear full length MTA1 and a more
recently described “short” cytoplasmic isoform, MTA1s. MTA1s shares the full length protein’s
N-terminus, but replaces the C-terminal SH2-nuclear localization domain with a distinct C-terminal
estrogen receptor-binding (LRILL) motif (7).
Figure 2. Western blots of MTA1 in control (2C, 3C) and AS-MTA1-treated (2M, 3M)
MDA-MB-231 (2C and 2M) and MDA-MB-435S (3C and 3M) cells. The AS-MTA1 treatment
suppresses several MTA1 antibody-reactive proteins that are differentially
expressed between the two breast cancer cell lines.


3. MDA-MB-231 and -435S express differing subsets of MTA1 isoforms; MDA-MB-231
expressed immunoreactive proteins that were AS-MTA1 suppressed. We suspect a number of
the other MTA1-antibody immunoreactive proteins (best seen in Figure 1, lanes 3-6) result from
alternatively spliced MTA1 variants that use an upstream initiating methionine and are not
targeted by the AS-MTA1 oligomer tested. RNA from treated and untreated cells was hybridized
to Affymetrix HG-U133A GeneChips. These microarrays measure expression of 14,500
annotated genes and 3,900 gene variants.
Figure 3. MTA1 knockdown-associated gene expression differences in MDA-MB-231
cells (2M, 2C). Also shown is expression of the same genes in treated 435S (3M, 3C)
cells. Genes in the four MTA1-regulated gene clusters (A-D) are identified (right).
AS-MTA1-associated gene expression differences between treated and untreated
MDA-MB-231 cells are shown for illustration purposes, as >250 genes showed
>2-fold expression difference in treated -435S cells.


The DNA Chip Analyzer v1.3 (8, 9) program was used to normalize the array data and
model gene expression. Differential gene expression (> 2-fold changes, p < 0.05) in AS-MTA1-treated
MDA-MB-231 breast cancer cells is summarized in Figure 3, which also shows the
behavior of the same genes in the less metastatic MDA-MB-435S (3M, 3C) breast cancer cell
line. In MDA-MB-231, AS-MTA1 treatment altered the expression of just 22 genes in four clusters
(A-D). Gene expression patterns in the untreated MDA-MB-231 and -435S cells (2C, 3C) were
distinct, while those for the AS-MTA1-treated (2M, 3M) cells converged. The non-overlapping
MTA1 isoform repertoires in the two cell lines affect MTA1-regulated gene expression. For
example, cluster “A” shows coordinate regulation of the TRIM5 and RKHD ring finger genes. In
the -231 line, histone H4, an MTA1 acetylation target (11), is downregulated by AS-MTA1
treatment (i.e., higher expression in the presence of MTA1), but upregulated in AS-MTA1-treated
MDA-MB-435S. Differences in clusters “B” and “D” were -231 specific, while
those in “C” were coordinately regulated in both cell lines. Consistent with likely MTA1 functions,
the functions of the differentially expressed genes include apoptosis (BCL2-like, GRB2-like endophilin B),
cell motility/adhesion (ARGIDA, CDC42BPA, CDC42BPB, gelsolin, CLPTM1, membralin, ITGB4,
MARK2, PLEC1), cell proliferation (ADAM17, EGFR), and gene transcription (ILF3, RKHD2,
CDC42BPA, CSG6 RNAPOL2). Our IHC analysis detects a link between MTA1 and AC
chemoresponse (1), and microarray analysis of untreated tumor core biopsies showed that MTA1 and
microtubule- and mitotic spindle-associated genes are differentially expressed, suggesting a
potential role in docetaxel chemoresponse.
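
The selection rule stated above (> 2-fold change, p < 0.05) can be sketched as a simple filter. This is only an illustration of the threshold logic, not the DNA Chip Analyzer procedure itself; the function name `is_differential`, the gene labels, and all expression values and p-values below are hypothetical and are not data from this study.

```python
import math

def is_differential(treated, control, p_value, min_fold=2.0, alpha=0.05):
    """True when expression differs by more than min_fold in either
    direction (up or down) and the comparison has p < alpha."""
    log_fold = math.log2(treated / control)
    return abs(log_fold) > math.log2(min_fold) and p_value < alpha

# Hypothetical (treated, control, p) triples, for illustration only.
genes = {
    "gene_up": (820.0, 300.0, 0.01),     # >2-fold up, significant
    "gene_down": (150.0, 610.0, 0.003),  # >2-fold down, significant
    "gene_flat": (400.0, 380.0, 0.60),   # small change
    "gene_noisy": (900.0, 300.0, 0.20),  # 3-fold up, but p >= 0.05
}

selected = sorted(
    name for name, (t, c, p) in genes.items() if is_differential(t, c, p)
)
print(selected)
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which matches the way up- and down-regulated clusters are reported together in Figure 3.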

Key  Research  Accomplishments 

1. IHC studies of tumors with long-term follow-up showed MTA1 is a prognostic factor for breast
cancer recurrence risk: women whose primary tumors overexpressed nuclear MTA1 protein had
a 2.7-fold increase in recurrence risk.


2. The same study indicated MTA1 overexpression was associated with adriamycin-cytoxan and
tamoxifen chemosensitivity. In addition, our report (Chang et al 2003) found 3-fold higher
levels of MTA1 mRNA in docetaxel-sensitive breast tumors. We will continue to study the roles
of MTA1 isoforms in breast tumor chemoresponse.

3. Western blot analyses of breast cancer cell lines identified new MTA1 isoforms, and the
AS-MTA1 gene expression studies suggest these different isoforms differentially modulate
MTA1-regulated gene expression. We plan further studies on how MTA1 isoform heterogeneity affects
its performance as a prognostic and predictive factor.

4. Work supported in part by this funding produced a co-transgenic MMTV-MTA1 and
MMTV-EGFP (green fluorescent protein) mouse model. These animals are being bred to single
oncogene-derived (MYC, Ras, and neu) transgenic mouse models of primary breast tumors. We
plan to investigate these animals for de novo MTA1-induced metastases (tg-Ras, Myc tumors do
not metastasize), and MTA1-enhanced metastasis (tg-Neu tumors produce spontaneous mets at
low frequency).

5.  A  manuscript: 

“Martin MD, Hilsenbeck S, Mohsin SK, Hopp TA, Clark GM, Osborne CK, Allred DC, O’Connell
P. Breast tumours overexpressing nuclear isoforms of metastasis-associated 1 (MTA1) protein
have high recurrence risks and enhanced response to systemic therapies” was submitted to
Lancet, but returned without review based on editorial priorities. It will be reformatted and
submitted to another journal.

6. A manuscript describing novel isoforms of MTA1, and differential gene expression in
AS-MTA1-treated versus control breast cancer cell lines, is in preparation.

7. A manuscript reporting work supported in part by this funding, characterizing a
transgenic mouse overexpressing the 80 kilodalton MTA1 steroid hormone co-repressor
isoform as well as enhanced green fluorescent protein in mammary gland (MMTV-promoter
driven), is planned once a sufficient number of animals have been characterized.

Reportable  Outcomes 

1.  Chang  JC,  Wooten  EC,  Tsimelzon  A,  Hilsenbeck  SG,  Gutierrez  MC,  Elledge  R,  Mohsin  S, 
Osborne CK, Chamness G, Allred DC, O’Connell P. Gene expression profiling predicts
therapeutic response to docetaxel (Taxotere™) in breast cancer patients. Lancet 362:364-369,
2003  (reprint  attached  as  a  .pdf  file) 

2. We have just confirmed a transgenic mouse line co-expressing the MTA1 nuclear steroid
hormone co-repressor isoform and green fluorescent protein (to facilitate detection of
micrometastases) under control of an MMTV promoter.

3.  Data  collected  under  the  aegis  of  this  funding  led  to  the  award  of  a  V  Foundation  clinical 
translation  grant  to  the  PI  (O’Connell). 

Conclusions 

Concept award (DAMD17-03-1-0648) funding of “MTA1-regulated gene expression: New
markers of breast cancer metastasis” permitted my laboratory to make significant progress
investigating MTA1 as a prognostic marker of occult systemic disease in node-negative patients,
and as a predictive marker of tumor chemoresponse. Our data suggest that MTA1-overexpressing
node-negative primary breast tumors are at high risk for recurrence, and that
patients whose locally treated primary tumors show favorable risk marker profiles and normal
levels of MTA1 expression might be spared adjuvant chemotherapy. Furthermore, the
association between MTA1 overexpression and enhanced chemoresponse has potential
implications for all breast cancer patients, and warrants additional study. I greatly appreciate the
support of my research by the U.S. Army Medical Research and Materiel Command.
repressor, regulating transcription via chromatin remodeling. MTA1 mRNA levels are
elevated in metastatic relative to non-metastatic tumours. MTA1 loss of heterozygosity
is significantly less frequent in node-positive relative to node-negative breast tumours,
suggesting epigenetic alterations of MTA1 affect metastatic potential (1).


Immunohistochemistry showed that MTA1-overexpressing tumours have recurrence
risks similar to node-positive tumours. Untreated node-negative tumours that
overexpressed MTA1 had the highest relapse risk (HR = 2.68, p = 0.0006).
Chemotherapy eliminated all MTA1 associations with clinical outcome, suggesting
MTA1 overexpression predicts early relapse, but is associated with enhanced
chemoresponse.


The prognostic significance of lymph node metastases has long been known to dichotomize risk
of local versus systemic breast cancer, as has steroid hormone receptors’ predictive value in selection of
adjuvant hormonal therapy versus cytotoxic chemotherapies. While extremely useful, the risk factors in
current use have limits. Finding a breast tumour to be oestrogen receptor-negative cannot indicate the
patient’s optimal chemotherapy regimen, and the absence of lymph node metastases cannot stratify
relapse risks for node-negative patients. Due to improved awareness and screening programs, women
with primary breast cancer increasingly present with node-negative disease. Since the biomarkers in
current use cannot differentiate risks for node-negative patients, most opt for chemotherapy, although
relatively few stand to benefit. Effective prognostic indicators of micrometastasis could stratify recurrence
risks and adjuvant therapy benefits, sparing the majority of these women from the toxicity and cost of
chemotherapy.

MTA1 is a steroid hormone receptor co-repressor (2), but inferring a specific role for MTA1 in
metastasis is complicated by the rapidly growing MTA gene family’s at least six alternatively spliced forms
encoded at three separate loci (i.e., MTA1 at 14q; MTA2 at 11q; MTA3 at 2q). Multiple alignment of MTA
gene family open reading frames identified an MTA1-specific peptide that, when attached to a hapten,
generated a rabbit anti-MTA1 polyclonal antibody. MTA1 undergoes alternative splicing to both full length
MTA1 and a previously described “short” cytoplasmic isoform (MTA1s) that replaces full length MTA1’s
C-terminal src homology and nuclear localization domains with a distinct C-terminal ER-binding (LRILL) motif
(2). MTA1s interacts with ERα in cytoplasm rather than the nucleus (2).

We studied a large collection of archived primary breast cancers with an average of 8.8 years of
clinical follow-up. Only 15% of the primary breast tumours studied showed significant cytoplasmic
immunohistochemistry (IHC) staining, but to avoid confusion based on MTA1s’ likely alternative function,
only nuclear IHC signals were scored for these analyses. MTA1 nuclear IHC signals were scored on a
range of 0-8 by adding a five point proportional score for percent of IHC positive cells to a three point IHC
staining intensity scale (3). To define MTA1 overexpression, we compared MTA1 nuclear IHC scores
measured in normal versus tumour tissues. Breast tumour specimens tested had a significantly higher
IHC score (3.57 versus 5.07, respectively, for normal and tumour tissues: p < 0.0002). As IHC scores
exceeding 5 occurred in less than 5% of normal tissues, we defined MTA1 overexpression as an IHC
score equal to or greater than 6. Correlation analyses found no association between MTA1 expression,
positive lymph nodes, or tumour size. As shown in Table 1, multivariate analysis of the full tumour set
revealed that MTA1 overexpression was significantly associated with early relapse (HR = 1.91, p =
0.0015). To avoid bias created by adjuvant endocrine and/or cytotoxic therapies, node-negative patients
were separated into treated (N=217) and untreated (N=397) subsets. In the untreated subset, both
univariate and multivariate analysis indicated MTA1 overexpression was a strong prognostic indicator of
early disease recurrence (HR = 2.68, p = 0.0006), outperforming both tumour size (HR = 1.41, p = 0.039)
and S-phase fraction (HR = 1.26, p = 0.072). As indicated in Figure 1, the 23% (93/394) of untreated
node-negative patients whose tumours overexpressed MTA1 had significantly increased risk of
early disease recurrence (p = 0.0001 in univariate analysis and p = 0.0006 in multivariate analysis). For the 7%
(29/394) of patients whose tumours expressed the highest levels of MTA1 (IHC score 7-8), relapse rates
exceeded 60%. Table 1 indicates that despite MTA1’s nearly 2-fold increase in recurrence risk, neither
univariate nor multivariate models of overall survival detected any association between MTA1
overexpression and earlier patient death (p = 0.42). Table 1 also shows that the treated subset of
node-negative patients had no MTA1-associated increase in recurrence risk (p = 0.61). The acute risk yet
unaffected survival seen in locally treated node-negative patients suggests that upon systemic treatment,
their MTA1-overexpressing recurrent disease had enhanced treatment responses. These findings
indicate that MTA1 overexpression is an independent prognostic indicator of risk of early relapse,
especially in untreated lymph node-negative primary breast cancers. MTA1 overexpression fails to
directly associate with robust indicators of recurrence such as tumour size and lymph node status,
suggesting that MTA1-facilitated distant spread is independent of, and perhaps distinct from, lymph
node-associated recurrence risk. As a result, measurement of MTA1 by IHC gleaned independent information
tumour gene expression profiles, we found MTA1 mRNA levels were elevated 2.9-fold (p = 0.0085) in the
docetaxel-sensitive primary tumours (4).
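
The nuclear IHC scoring described above (a proportion score added to an intensity score to give a 0-8 total, with overexpression defined as a total of 6 or more) can be sketched as follows. The percent-positive bin boundaries in `proportion_score` are an assumption based on commonly used Allred-style cut-offs; the manuscript only cites reference (3) for the scale and does not list them. All function names are hypothetical.

```python
def proportion_score(percent_positive):
    """Map percent of IHC-positive cells to a 0-5 score.
    Bin boundaries are ASSUMED (Allred-style); the source does not state them."""
    if percent_positive <= 0:
        return 0
    if percent_positive < 1:
        return 1
    if percent_positive <= 10:
        return 2
    if percent_positive <= 33:
        return 3
    if percent_positive <= 66:
        return 4
    return 5

def ihc_score(percent_positive, intensity):
    """Total score (0-8) = proportion score (0-5) + staining intensity (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return proportion_score(percent_positive) + intensity

def overexpresses_mta1(score, cutoff=6):
    """Overexpression as defined in the text: IHC score >= 6."""
    return score >= cutoff

example = ihc_score(percent_positive=50, intensity=2)
print(example, overexpresses_mta1(example))
```

The cut-off of 6 comes directly from the text, where scores above 5 occurred in fewer than 5% of normal tissues; only the binning of percent-positive cells is supplied here for illustration.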

In summary, our data suggest that measuring MTA1 protein expression in primary breast tumours
identifies a high-risk subset of node-negative patients who need aggressive treatment, and a larger subset
with no MTA1-associated recurrence risks. Furthermore, the strong association between MTA1
overexpression  and  enhanced  treatment  response  has  potential  implications  for  all  breast  cancer  patients, 
and  warrants  additional  study. 
MECHANISMS OF DISEASE


Gene  expression  profiling  for  the  prediction  of  therapeutic 
response  to  docetaxel  in  patients  with  breast  cancer 


Jenny C Chang, Eric C Wooten, Anna Tsimelzon, Susan G Hilsenbeck, M Carolina Gutierrez, Richard Elledge, Syed Mohsin,
C Kent Osborne, Gary C Chamness, D Craig Allred, and Peter O'Connell


Summary 

Background  Systemic  chemotherapy  for  operable  breast 
cancer  substantially  decreases  the  risk  of  death.  Patients 
often  have  de  novo  resistance  or  incomplete  response  to 
docetaxel,  one  of  the  most  active  agents  in  this  disease.  We 
postulated  that  gene  expression  profiles  of  the  primary 
breast  cancer  can  predict  the  response  to  docetaxel. 

Methods We took core biopsy samples from primary breast
tumours in 24 patients before treatment and then assessed
tumour response to neoadjuvant docetaxel (four cycles,
100 mg/m² daily for 3 weeks) by cDNA analysis of RNA
extracted from biopsy samples using HgU95 GeneChip.

Findings From the core biopsy samples, we extracted
sufficient total RNA (3-6 μg) for cDNA array analysis using
HgU95-Av2 GeneChip. Differential patterns of expression of
92 genes correlated with docetaxel response (p=0.001).
Sensitive  tumours  had  higher  expression  of  genes  involved  in 
cell  cycle,  cytoskeleton,  adhesion,  protein  transport,  protein 
modification,  transcription,  and  stress  or  apoptosis;  whereas 
resistant  tumours  showed  increased  expression  of  some 
transcriptional  and  signal  transduction  genes.  In  leave-one- 
out  cross-validation  analysis,  ten  of  11  sensitive  tumours 
(90%  specificity)  and  11  of  13  resistant  tumours  (85% 
sensitivity)  were  correctly  classified,  with  an  accuracy  of  88%. 
This  92-gene  predictor  had  positive  and  negative  predictive 
values  of  92%  and  83%,  respectively.  Correlation  between 
RNA  expression  measured  by  the  arrays  and 
semiquantitative  RT-PCR  was  also  ascertained,  and  our 
results  were  validated  in  an  independent  set  of  six  patients. 
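
The performance figures quoted in the Findings follow from a 2x2 table built from the stated counts (ten of 11 sensitive and 11 of 13 resistant tumours correctly classified). The short sketch below recomputes them, taking resistant tumours as the positive class, as the pairing of "resistant" with "sensitivity" in the text implies. The function name `classifier_metrics` is hypothetical.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Return standard 2x2 performance measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts from the Findings: 11 of 13 resistant tumours (positives) and
# 10 of 11 sensitive tumours (negatives) were correctly classified.
metrics = classifier_metrics(tp=11, fn=2, tn=10, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these counts the positive and negative predictive values come to 11/12 and 10/12, in line with the reported 92% and 83%; specificity is 10/11 (about 90.9%), which the summary gives as 90%.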

Interpretation  If  validated,  these  molecular  profiles  could 
allow  development  of  a  clinical  test  for  docetaxel  sensitivity, 
thus  reducing  unnecessary  treatment  for  women  with  breast 
cancer. 

Lancet  2003;  362:  280-87 
See  Commentary 
Introduction

Adjuvant  systemic  treatment  after  surgery  for  breast 
cancer  is  the  most  crucial  factor  in  reducing  mortality — 
both  chemotherapy  and  hormonal  treatment  reduce  the 
risk of death in such patients. However, although
oestrogen-receptor  status  is  predictive  of  response  to 
hormonal  treatments,  there  are  no  clinically  useful 
predictive  markers  of  a  patient’s  response  to 
chemotherapy.  Therefore,  all  patients  who  are  eligible  for 
chemotherapy  receive  the  same  treatment,  even  though  de 
novo  drug  resistance  will  result  in  treatment  failures  in 
many. The taxanes, docetaxel and paclitaxel, are a new
class of antimicrotubule agent that are more effective than
older drugs such as anthracyclines,**7 although results of
clinical trials with taxanes and anthracyclines in
combination  show  that  only  a  small  subset  of  patients 
benefit  from  the  addition  of  taxanes.8,9  There  are  no 
methods  to  distinguish  between  patients  who  are  likely  to 
respond  to  taxanes  and  those  who  are  not.  In  view  of  the 
accepted  practice  of  giving  adjuvant  treatment  to  most 
patients,  even  if  the  average  expected  benefit  is  low,  the  a 
priori  selection  of  appropriate  patients  most  likely  to 
benefit  from  adjuvant  treatment  with  taxanes  would  be  a 
great  advance  in  the  clinical  management  of  breast 
cancer.8,9  A  major  impediment  to  study  of  predictors  of 
effectiveness  of  adjuvant  treatment  is  the  absence  of 
surrogate  markers  for  survival  and,  consequently,  large 
numbers  of  patients  and  long-term  follow-up  are  needed. 

We  aimed  to  identify  gene  expression  patterns  in 
primary  breast-cancer  specimens  that  might  predict 
response  to  taxanes.  Neoadjuvant  chemotherapy  (ie, 
treatment  before  primary  surgery)  allows  for  sampling  of 
the  primary  tumour  for  gene  expression  analysis,  and  for 
direct  assessment  of  response  to  chemotherapy  by 
monitoring  changes  in  tumour  size  during  the  first  few 
months  of  treatment.10,11  Clinical  response  of  the  tumour 
to  neoadjuvant  chemotherapy  is  a  valid  surrogate  marker 
of  survival:  patients  whose  tumours  regress  substantially 
after  neoadjuvant  chemotherapy  have  better  outcome 
than  those  with  modest  response  or  clinically  obvious 
disease that is resistant to chemotherapy.10,11 With the
advent  of  high-throughput  quantification  of  gene 
expression,  simultaneous  assessment  of  thousands  of 
genes  is  now  possible,  which  allows  identification  of 
expression  patterns  in  different  breast  cancers  that  might 
correlate  with,  and  thereby  predict,  excellent  clinical 
response to treatment.12-16 These profiles have potential to
explain  the  genetic  heterogeneity  of  breast  cancer  and 
allow  treatment  strategies  to  be  planned  in  accordance 
with  their  probability  of  success  in  individual  patients. 
Hence,  neoadjuvant  chemotherapy  provides  an  ideal 
platform  from  which  to  discover  predictive  markers  of 
chemotherapy  response.  In  our  study,  we  took  core  needle 
biopsy  samples  of  the  primary  breast  cancer  for  gene 
expression  profiling  before  patients  received  neoadjuvant 
docetaxel.  We  aimed  first,  to  show  that  sufficient  RNA 
GLOSSARY

ANEUPLOIDY

Cells containing an abnormal complement of chromosomes.

APOPTOSIS

Programmed cell death. A genetic mechanism leading to induced cell
death that involves activation of a cascade of genes. Apoptosis arises in
normal tissue and can be associated with particular disease states.

RESUBSTITUTION ESTIMATES

Application of the classifier to the samples used to create it.


could  be  obtained  from  core  biopsy  samples  to  assess 
gene  expression;  second,  to  identify  groups  of  genes  that 
could  be  used  to  distinguish  primary  breast  cancers  that 
are  responsive  or  resistant  to  docetaxel  chemotherapy; 
and  third,  to  identify  gene  pathways  that  could  be 
important  in  the  mechanism  of  resistance  to  docetaxel. 

Methods

Patients 

From  September,  1999,  to  June,  2001,  patients  with 
locally  advanced  breast  cancer  (ie,  primary  cancers 
>4  cm,  or  clinically  evident  axillary  metastases)  were 
considered  for  a  phase  2  study  with  neoadjuvant 
docetaxel.  Inclusion  criteria  were  (1)  age  greater  than 
18  years  and  a  diagnosis  of  breast  cancer  confirmed  by 
analysis  of  a  core  needle  biopsy  sample,  (2) 
premenopausal  status  accompanied  by  appropriate 
contraception,  (3)  adequate  performance  status,  and  (4) 
adequate  liver  and  kidney  function  tests  (all  within 
1.5 times the institution’s upper limit of normal).
Patients  were  excluded  if  they  had  severe  underlying 
chronic  illness  or  disease,  or  were  taking  other 
chemotherapeutic  drugs  while  on  study. 

This  study  (protocol  H8448)  was  approved  by  the 
institutional  review  board  of  Baylor  College  of  Medicine, 
Houston,  TX,  USA.  Patients  gave  written  informed 
consent. 

Clinical  procedures 

We  recorded  clinical  staging  and  size  of  primary  tumour  at 
the  start  of  treatment,  at  every  cycle,  and  after  completion 
of  four  cycles  of  chemotherapy.  Tumour  size  (product  of 
the  two  largest  perpendicular  diameters)  measured  before 
and  after  four  cycles  of  neoadjuvant  chemotherapy  was 
used  to  calculate  the  percentage  of  residual  disease.  The 
median  residual  disease  was  then  calculated,  and  this 
degree  of  response  was  used  to  divide  the  cancers  into  two 
roughly  equal  groups — sensitive  and  resistant  tumours — 
before  we  did  gene  expression  analysis. 
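
The response grouping described above can be illustrated with a short sketch: tumour size is the product of the two largest perpendicular diameters, residual disease is the post-treatment size as a percentage of the pre-treatment size, and the median residual disease divides the tumours into two groups. The patient labels and measurements are hypothetical, and assigning a tumour exactly at the median to the sensitive group is an assumption; the text does not say how ties were handled.

```python
from statistics import median

def tumour_size(d1_cm, d2_cm):
    """Product of the two largest perpendicular diameters (cm^2)."""
    return d1_cm * d2_cm

def percent_residual(size_before, size_after):
    """Residual disease after treatment, as a percentage of baseline size."""
    return 100.0 * size_after / size_before

# Hypothetical diameters (before, after), in cm; not study data.
patients = {
    "P1": ((5, 4), (2, 2)),
    "P2": ((6, 5), (5, 4)),
    "P3": ((4, 4), (1, 1)),
    "P4": ((8, 5), (6, 5)),
}

residual = {
    pid: percent_residual(tumour_size(*before), tumour_size(*after))
    for pid, (before, after) in patients.items()
}
cutoff = median(residual.values())

# ASSUMPTION: residual disease at or below the median counts as sensitive.
groups = {
    pid: "sensitive" if value <= cutoff else "resistant"
    for pid, value in residual.items()
}
print(groups)
```

Because the cut-off is the cohort median, the split is defined relative to the patients studied rather than by a fixed clinical threshold, which is why the two groups come out roughly equal in size.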

Before  docetaxel  was  given,  we  did  core  biopsies  of  the 
primary  cancers.  To  obtain  sufficient  tissue,  we  did  about 
six core biopsies from every patient using an MC1410
MaxCore biopsy instrument (Bard, Covington, GA,
USA). Samples were taken after patients had been given
local anaesthesia, using the same entry point but
reorienting the needle. Two to three core biopsy
specimens  were  immediately  transferred  for  snap  freezing 
at  -80°C  for  cDNA  array  analysis.  The  remaining 
specimens  were  fixed  in  formalin  for  diagnostic  analysis 
and  possible  immunohistochemical  analysis. 

Four cycles of docetaxel were given at 100 mg/m² every
3  weeks,  and  we  assessed  clinical  response  after  the  fourth 
cycle,  at  12  weeks.  As  part  of  standard  care,  patients  were 
continued  on  neoadjuvant  chemotherapy  through  the  full 
four  cycles,  unless  there  was  clear  documentation  of 
progressive  disease,  which  we  defined  as  increase  in 
tumour size of more than 25%. After the course of
neoadjuvant docetaxel was complete, primary surgery was
done  and  standard  adjuvant  treatment  was  given. 

RNA  extraction  and  amplification 

We  isolated  total  RNA  from  the  frozen  core  biopsy 
specimens  in  accordance  with  protocols  recommended 
by Affymetrix (Santa Clara, CA, USA) for GeneChip
experiments.  Total  RNA  was  isolated  with  Trizol  reagent 
(Invitrogen  Corporation,  Carlsbad,  CA).  Samples  were 
subsequently  passed  over  a  Qiagen  RNeasy  column 
(Qiagen, Valencia, CA) for control of small fragments
that affect RT-reaction and hybridisation quality (ECW,
unpublished data). Each core biopsy yielded 3-6 μg of
total RNA. After RNA recovery, double-stranded cDNA
was synthesised with a chimeric oligonucleotide containing
an oligo-dT and a T7 RNA polymerase promoter at a
concentration of 100 pmol/μL.

We  did  reverse  transcription  in  accordance  with 
protocols  recommended  by  Affymetrix  using 
commercially  available  buffers  and  proteins  (Invitrogen 
Corporation).  Biotin  labelling  and  about  250-fold  linear 
amplification  followed  phenol-chloroform  clean  up  of  the 
reverse-transcription  reaction  product  and  was  done  by 
in-vitro  transcription  (Enzo  Biochem,  New  York,  NY, 
USA) over a reaction time of 8 h. From each biopsy
specimen, we hybridised 15 μg of labelled cRNA onto
the  U95Av2  GeneChip  using  recommended  procedures 
for  prehybridisation,  hybridisation,  washing,  and  staining 
with  streptavidin-phycoerythrin  (SA-PE).  Antibody 
amplification  was  done  with  a  biotin-linked  antibody  to 
streptavidin  (Vector  Laboratories,  Burlingame,  CA)  with 
a  goat-IgG  blocking  antibody  (Sigma,  St  Louis,  MO, 
USA).  A  second  application  of  the  SA-PE  dye  was  used 
after  additional  wash  steps  had  been  done.  After 
automated  staining  and  wash  protocols  (Affymetrix 
protocol  EukGE-2v4),  the  arrays  were  scanned  by  the 
Affymetrix  GeneChip  scanner  (Agilent,  Palo  Alto,  CA) 
and quantitated with Microarray Suite version 5.0
(Affymetrix).  The  U95Av2  GeneChip  consists  of  about 
12  625  probe  sets,  each  containing  about  16  perfect 
match  and  corresponding  mismatch  25mer 
oligonucleotide  probes  representing  sequences  (genes), 
most  of  which  have  been  characterised  in  terms  of 
function  or  disease  association.  The  raw,  un-normalised 
probe  level  data  were  then  analysed  by  dChip 
(http://dchip.org)  for  final  normalisation  and  modelling. 
Median  intensity  was  used  for  the  normalisation  of  the 
24  arrays  and  the  perfect  match/mismatch  (PM/MM) 
modelling  algorithm  was  used. 

Semiquantitative  RT-PCR 

We  did  semi-quantitative  RT-PCR  (sqRT-PCR) 
measurement  of  gene  expression  levels  using  the  same 
amplified  cRNA  hybridised  to  the  GeneChip.  20  genes 
were  selected  for  analysis  on  the  basis  of  their  high  variation 
in  expression.  Primers  were  designed  for  these  loci  with  the 
sequences  freely  available  from  the  Entrez  Nucleotide 
database17  and  the  Primer3  algorithm  for  primer  design. 
Product  sizes  were  kept  short  (<150  bp)  to  allow  the 
maximum  ability  to  work  under  varying  conditions  relative 
to  cRNA  quality.  Primers  were  optimised  with  a  reverse- 
transcribed  mixture  of  six  samples.  15  duplicate  reactions 
were  prepared  and  samples  were  taken  at  alternating  cycle 
numbers  between  15  and  33  to  ensure  that  the  sqRT-PCR 
reaction  products  were  in  a  linear  range  of  accumulation. 
These  samples  were  then  arranged  in  ascending  order, 
diluted with 10 µL loading buffer, and 3 µL of each sample
was loaded onto 6% denaturing acrylamide gels.
Electrophoresis at 60 W was done for 2 h, or until sufficient
Figure 1: Two methods of statistical analysis

A: the prognostic analysis used by van't Veer and colleagues18 used oligonucleotide microarrays with 25 000 genes, from which 5000 variably expressed
genes were selected by filtering. Of these, 231 genes were significantly associated with prognostic outcome (|r|>0·3). These 231 genes were then rank-
ordered on the basis of the magnitude of the correlation coefficient and selected in groups of five to construct the smallest optimum classifier. Leave-one-
out analysis was then done with 231 genes that were correlated with outcome to select a classification set of 70 genes. B: statistical analysis methods
used in this study: a subset of 1628 genes was selected by filtering on signal intensity to eliminate genes with uniformly low expression or genes whose
expression did not vary significantly across the samples.


separation  of  the  xylene  cyanol  and  bromophenol  blue  dyes 
was  achieved.  Gels  were  then  fixed,  removed  from  the  rear 
plate,  transferred  to  filter  paper,  and  dried.  We  first 
assessed  these  dry  gels  using  autoradiography  (about  8  h 
exposure,  no  intensification),  and  analysable  gels  were  then 
exposed  to  phosphorimaging  screens.  Primers  that  failed  to 
produce  a  single  clear  band  were  attempted  again  with 
different  annealing  temperatures  until  a  single  band  was 
produced. 

15  of  the  20  primers  chosen  proved  suitable  to  use  and 
gave  clean,  single  bands  for  analysis.  The  remaining  five 
failed  to  optimise  properly  and  were  not  included  in  any 
further  analysis.  Although  high-cycle  samples  inevitably 
achieved  pixel-saturation,  care  was  taken  to  keep  exposure 
times  to  a  minimum,  so  as  to  keep  intensity  within  the 
informative  range  on  most  cycle-totals  within  each  set.  To 
determine  the  linear  range  of  the  15  primers,  we  analysed 
their  absolute  intensities  using  Microsoft  Excel  graphing 
functions.  We  then  did  phosphorimager  quantification 
analysis  (Bio-Rad  Laboratories,  Hercules,  CA),  and  RT- 
PCR  product  band  intensities  were  quantitatively 
compared  with  normalised,  model-based  estimates  of 
expression  from  the  GeneChip  data. 

Statistical  analysis 

The  analytical  approach  used  in  this  study  (figure  1)  was 
similar to the successful methods described previously.18
After scanning and low-level quantification using
Microarray Suite (Affymetrix), we used DNA-Chip
analyser dChip version 1.2 to adjust arrays to a common
baseline19  and  estimated  expression  using  Li  and 
colleagues’  PM-MM  model.20,21  We  eliminated  genes  that 
were  not  present  in  at  least  30%  of  samples,  and  exported 
expression  data  for  the  remaining  6849  genes  to  BRB 
Arraytools  version  2.1c22  for  more  filtering  and  analysis. 
After  transforming  all  data  by  taking  logarithms,  we 
ranked  genes  by  variability  over  all  24  samples,  and  we 
retained  the  1628  genes  that  were  significantly  more 
variable  than  the  median  variance. 
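
The variability filter described above can be sketched as follows. This is a minimal illustration, not the BRB ArrayTools procedure: the text does not give the statistical test used to decide that a gene is "significantly more variable" than the median, so the sketch substitutes a plain cutoff (variance greater than `min_ratio` times the median variance). The function name, the `min_ratio` parameter and the input layout are assumptions made for illustration.

```python
import math
import statistics


def filter_variable_genes(expr, min_ratio=1.0):
    """Return genes ranked by variance of log2 expression, keeping only those
    whose variance exceeds min_ratio times the median variance.

    expr: dict mapping gene name -> list of positive expression values,
          one value per sample (hypothetical input layout).
    """
    # Log-transform every value before measuring variability.
    log_expr = {g: [math.log2(v) for v in vals] for g, vals in expr.items()}
    variances = {g: statistics.variance(vals) for g, vals in log_expr.items()}
    median_var = statistics.median(variances.values())
    # Rank from most to least variable, then apply the simplified cutoff.
    ranked = sorted(variances, key=variances.get, reverse=True)
    return [g for g in ranked if variances[g] > min_ratio * median_var]
```

Because the filter never looks at the response labels, it can be applied once to all samples without biasing the later cross-validation.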


We  selected  differentially  expressed  genes  from  the 
filtered  gene  list  using  the  two-sample  t  test,  and  then 
used  a  global  permutation  test  as  an  overall,  multiple 
comparison-free  test  of  whether  the  number  of 
differentially  expressed  genes  exceeded  that  which  might 
arise  by  chance.  In  this  test,  the  observed  number  of 
significantly  differentially  expressed  genes  was  compared 
with  the  distribution  of  numbers  of  differentially 
expressed  genes  generated  by  repeatedly  permutating  the 
labels  of  the  samples  and  recalculating  the  t  test  at  the 
specified  level  of  significance. 
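
A minimal sketch of this global permutation test is given below. It assumes a pooled-variance two-sample t statistic and expresses the significance level as a critical value `t_crit` (for 24 samples, ie 22 degrees of freedom, a two-sided p of 0·001 corresponds to |t| of about 3·79) so that only the standard library is needed. Function names and the data layout are illustrative assumptions, not the BRB ArrayTools implementation. The sketch counts permutations that match or exceed the observed number of genes, which is slightly more conservative than counting only those that exceed it.

```python
import random
import statistics


def t_stat(a, b):
    """Pooled-variance two-sample t statistic for two lists of values."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0  # degenerate gene with no within-group variation: ignored
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def count_significant(expr, labels, t_crit):
    """Number of genes whose |t| reaches the critical value."""
    n = 0
    for values in expr.values():
        a = [v for v, lab in zip(values, labels) if lab]
        b = [v for v, lab in zip(values, labels) if not lab]
        if abs(t_stat(a, b)) >= t_crit:
            n += 1
    return n


def global_permutation_p(expr, labels, t_crit, n_perm=2000, seed=0):
    """Compare the observed gene count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = count_significant(expr, labels, t_crit)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute sample labels, keeping group sizes
        if count_significant(expr, shuffled, t_crit) >= observed:
            hits += 1
    return observed, hits / n_perm
```

The returned proportion is a single test on the whole gene list, so no correction for the number of genes tested is applied to it.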

Next,  we  developed  a  classifier  to  predict  response. 
With  a  list  of  discriminatory  genes  and  their  associated 
t values, we used the compound covariate predictor
method of Radmacher and colleagues23 to construct
a linear classifier. Resubstitution estimates of
classification  success,  in  which  the  classifier  is  applied  to 
the  same  samples  used  to  create  it,  are  invariably  biased 
(ie,  they  are  overly  optimistic).24,25  Therefore,  we  used  an 
external  cross-validation  procedure  to  generate  a  less 
biased  estimate  of  classification  success.  Starting  with 
1628  genes  that  had  significant  variation  in  expression, 
and  which  were  filtered  without  any  respect  to  class 
membership,  the  entire  gene  selection  and  classifier 
construction  process  was  repeated  in  a  leave-one-out 
cross-validation  to  estimate  classifier  performance. 
Finally,  to  assess  whether  the  degree  of  successful 
classification  we  noted  could  have  arisen  by  chance, 
the  entire  cross-validation  procedure  was  repeated 
2000  times,  permutating  the  sample  labels  every  time. 
The  observed  cross-validated  classification  success  rate 
was  then  compared  with  the  distribution  of  classification 
success  in  the  permutation  analysis.  Cross-validated 
performance  was  summarised  by  observed  sensitivity 
and  specificity,  and  associated  exact  binomial  confidence 
intervals.  Resubstitution  classifier  values  were  also 
used  to  generate  a  receiver  operating  characteristic 
curve  (ROC  curve)  and  to  estimate  the  area  under 
the curve.

Patient  Age      Menopausal      Ethnic    Bidimensional  Clinical  Oestrogen-  Progesterone-  HER-2  Tumour
         (years)  status          origin    tumour size    axillary  receptor    receptor              type
                                            (cm)           nodes     status      status
1        37       Premenopausal   Hispanic  10x10          No        ?           ?              ?      IMC
2        55       Postmenopausal  Hispanic  10x8           Yes       -           -              +      IDC
3        41       Premenopausal   Black     6x5            Yes       +           +              -      IDC
4        43       Premenopausal   Black     15x13          Yes       +           -              -      IMC
5        50       Postmenopausal  Black     20x23          Yes       -           -              -      IDC
6        55       Postmenopausal  Black     11x11          Yes       +           +              -      IDC
7        42       Premenopausal   Black     7x9            Yes       +           +              -      IMC
8        63       Postmenopausal  Black     7x8            Yes       +           +              -      IMC
9        50       Postmenopausal  Black     13x9           No        +           +              -      IDC
10       38       Premenopausal   Hispanic  8x8            Yes       +           +              -      IMC
11       58       Postmenopausal  Hispanic  7x7            Yes       +           +              -      IMC
12       62       Postmenopausal  Hispanic  4x4            Yes       +           -              -      IDC
13       40       Premenopausal   Hispanic  5·5x4·5        No        +           -              ?      IMC
14       36       Premenopausal   Black     6x6            Yes       +           +              -      IDC
15       56       Postmenopausal  Black     5x5·5          No        +           -              ?      IMC
16       38       Premenopausal   White     6x6            Yes       +           -              -      IDC
17       54       Postmenopausal  White     5x6            Yes       +           +              +      IDC
18       52       Postmenopausal  White     10x10          No        +           +              -      IDC
19       57       Postmenopausal  White     8x8            No        -           -              -      IDC
20       52       Postmenopausal  Black     10x10          No        -           -              -      IDC
21       44       Premenopausal   Black     11x11          No        -           -              -      IDC
22       41       Premenopausal   Black     6x5            Yes       +           +              -      IDC
23       38       Premenopausal   White     8x8            Yes       +           +              -      IDC
24       54       Postmenopausal  Black     9x7            No        +           +              -      IDC

HER-2=HER-2/neu oncogene detected by immunohistochemical analysis. -=negative. +=positive. IMC=invasive mammary carcinoma. IDC=invasive ductal carcinoma. ?=not legible in the source (for patients 13 and 15, only two of the three receptor and HER-2 values are legible, and their column assignment is uncertain).

Table 1: Characteristics of patients in the training set


The  classifier  was  partly  validated  with  an  independent 
set  of  six  patients  treated  in  the  same  clinical  trial  as  those 
in  the  training  set.  RNA  was  obtained  from  pretreatment 
biopsy samples and hybridised to HgU95av2 GeneChips
exactly as described for the training sample. Probe level
data were adjusted to the same baseline array as the
training  set,  and  gene  expression  values  were  calculated 
with  previously  estimated  probe  sensitivity  values  derived 
from  the  training  sample.  The  92-gene  classifier  was  then 
applied  to  predict  response  in  every  new  sample. 
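
The compound covariate predictor and the leave-one-out procedure described in the statistical analysis section can be sketched as follows. This is an illustrative outline under stated assumptions, not the BRB ArrayTools code: genes are selected by a critical |t| value (`t_crit`) rather than a p value, each selected gene is weighted by its t statistic, and the cutoff is the midpoint between the mean compound scores of the two classes. All function names and the data layout are hypothetical.

```python
import statistics


def t_stat(a, b):
    """Pooled-variance two-sample t statistic for two lists of values."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0  # degenerate gene with no within-group variation: ignored
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def score(sample, weights):
    """Compound covariate: t-weighted sum of log expression values."""
    return sum(w * sample[g] for g, w in weights.items())


def train_ccp(samples, labels, t_crit):
    """samples: list of dicts (gene -> log expression); labels: list of bool."""
    weights = {}
    for g in samples[0]:
        a = [s[g] for s, lab in zip(samples, labels) if lab]
        b = [s[g] for s, lab in zip(samples, labels) if not lab]
        t = t_stat(a, b)
        if abs(t) >= t_crit:  # gene selection uses training samples only
            weights[g] = t
    pos = [score(s, weights) for s, lab in zip(samples, labels) if lab]
    neg = [score(s, weights) for s, lab in zip(samples, labels) if not lab]
    cutoff = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return weights, cutoff


def predict(sample, weights, cutoff):
    """True if the sample falls on the side of the True-labelled class."""
    return score(sample, weights) > cutoff


def loocv_accuracy(samples, labels, t_crit):
    """Repeat gene selection and classifier construction for each left-out sample."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        weights, cutoff = train_ccp(train_x, train_y, t_crit)
        if predict(samples[i], weights, cutoff) == labels[i]:
            correct += 1
    return correct / len(samples)
```

The important design point is that `train_ccp` is called inside the loop, so the left-out sample never influences which genes are selected. Applying a classifier built on all samples back to those same samples would give the optimistic resubstitution estimate.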

Role  of  the  funding  source 

The  study  sponsors  did  not  contribute  to  the  study 
design,  collection,  analysis,  or  interpretation  of  data.  The 
manuscript  was  reviewed  with  only  minor  editorial 
changes  by  one  of  the  study’s  sponsors,  Aventis 
Pharmaceutical. 

Results 

Assessment  of  clinical  response 

We  included  24  patients,  and  their  clinical  characteristics 
are shown in table 1. Unidimensional median tumour size
before treatment was 8 cm (range 4-30 cm). Before doing
gene expression analysis, we defined tumour sensitivity
and  resistance  on  the  basis  of  the  percentage  of  residual 
disease  after  treatment.  We  first  determined  that  the 
median  residual  disease  after  chemotherapy  was  30%.  We 
then  arbitrarily  defined  sensitive  tumours  as  those  that 
had  25%  or  less  residual  disease,  and  resistant  tumours  as 
those  with  more  than  25%  residual  disease,  since  this 
cutoff  divides  the  patients  into  two  almost  equally  sized 
groups  for  statistical  comparison.  In  this  study  of  locally 
advanced  breast  cancer,  tumours  were  large  and  a 
regression  of  at  least  75%  after  chemotherapy  would 
almost  certainly  represent  a  clinically  important  response. 
Of  these  24  patients,  11  (46%)  were  sensitive  to  docetaxel 
and  13  (54%)  were  resistant.  Of  the  sensitive  tumours, 
five  patients  (45%)  had  minimal  residual  disease  (<10% 
residual  tumour),  whereas  of  the  resistant  tumours,  seven 
(58%)  had  residual  tumour  mass  of  60%  or  greater,  and 
three  (23%)  of  these  residual  tumours  were  100%  or 
greater  of  baseline. 


Selection  of  discriminatory  genes 

To  select  discriminatory  genes,  we  compared  expression 
data  in  the  sensitive  and  the  resistant  tumours  (figure  2). 
First,  we  selected  a  subset  of  candidate  genes  by  filtering 
on  signal  intensity  to  eliminate  genes  with  uniformly  low 
expression  or  genes  whose  expression  did  not  vary 
significantly  across  the  samples,  retaining  1628  genes. 
After log transformation, a t test was used to select
discriminatory genes. t tests with nominal p values of
0·001, 0·01, and 0·05 selected 92, 300, and 551 genes,
respectively, for which expression differed in sensitive and
resistant groups (ie, differentially expressed). The
probability that these numbers of genes would be selected
by chance alone was estimated to be 0·0015, 0·001, and
less than 0·001, respectively (table 2). These results can be
reviewed  with  data  at  the  gene  expression  omnibus.26 

Functional  classification  of  discriminatory  genes 

The  92  genes  classed  as  most  significantly  “differentially 
expressed” at p=0·001 are listed in the webtable
(http://image.thelancet.com/extras/01art11086webtable.pdf)
(figure 2). These genes showed 4·2-2·6-fold decreases or
2·5-15·7-fold increases in expression in resistant
compared  with  sensitive  tumours.  Functional  classes  of 
these  differentially  expressed  genes  included  stress  or 
apoptosis  (21%),  cell  adhesion  or  cytoskeleton  (16%), 
protein  transport  (13%),  signal  transduction  (12%),  RNA 
transcription  (10%),  RNA  splicing  or  transport  (9%),  cell 
cycle  (7%),  and  protein  translation  (3%);  the  remainder 
(9%)  have  unknown  functions.  14  of  these  92  genes  were 
overexpressed  in  the  treatment-resistant  cluster  with 
major  categories  including  unknown  function,  protein 


p value for gene selection                  0·001    0·01     0·05
Number of differentially expressed genes    92       300      551
Permutation p*                              0·0015   0·001    <0·001

*The proportion of permutations in which the number of genes selected
exceeds the observed number of genes.

Table 2: Group comparison analysis, with different nominal
p values (0·001, 0·01, 0·05)

[Figure 2: row labels (Affymetrix probe set, gene symbol) for the discriminatory genes]

1624_at  RAP1GDS1
36626_at  HSD17B4
35807_at  CYBA
38765_at  DICER1
38211_at  ZNF288
33133_at  FUI
607_s_at  VWF
31431_at  FCGRT
39182_at  EMP3
32331_at  AK3
40064_at  ALS2CR3
34305_at  PCBP1
35733_at  ACTR2
38784_g_at  MUC1
40619_at  E2-EPF
35S44_at  SDC4
40060_r_at  UM
38372_at  Unnamed
39185_at  LOC56270
40096_at  ATP5A1
40463_at  KPNB2
41198_at  GRN
41627_at  SDF2
37361_at  FIBP
38618_at  LIMK2
39076_s_at  DRAP1
33371_s_at  RAB31
1641_s_at  DDB1
36811_at  LOXL1
33931_at  GPX4
38791_at  DDOST
388S0_at  Unnamed
1635_at  ABL1
31638_at  GAMT
39347_at  AP2S1
32523_at  CLTB
36846_s_at  LSM7
39030_at  RABAC1
691_g_at  P4HB
37674_at  AtASl
38613_at  CGI I
33781_s_at  UBE2M
32843_s_at  CSNK2B
33214_at  MRPS12
922_at  PPP2R1A
36125_s_at  RALY
39180_at  FUS
1751_g_at  CALR
38831_f_at  EPO
41528_at  ESI
40514_at  SDBCAG84
33393_at  00X19
362C8_at  BRD2
646_s_at  CLK2
1199_at  EIF4A1
40465_at  U5-100K
39724_s_at  CUL1
41757_at  VAPB
35626_at  SGSH
39561_at  DNAL4
38686_at  ATP6V0D1
41551_at  RER1
34845_at  CGI-51
41858_at  FRAG1
35695_at  CHS1
2085_s_at  CTNNA1
37313_at  GTF2H2
3899S_g_at  SLC25A1
39812_at  MRPL12
41413_at  CLPTM1
1997_s_at  BAX
41308_at  CTBP1
34163_i_at  RBPMS
39018_at  MGST3
34816_at  EP400
41672_at  Unnamed
33285 J.at  FLJ21168
543_g_at  CRABP1
4053SJ_at  tF2
41338_at  EST
36991_at  SFRS4
32099_at  KIAA0138
39638_at  TFAP4
1008_f_at  PRKR
40888_f_at  EEF1A1
1250_at  PRKDC
36898_r_at  PRIM2A
40118_at  ZNF38
38259_at  STXBP2
38942_r_at  AD024

translation, cell cycle, and RNA transcription. Tubulin
isoforms  were  associated  with  docetaxel  resistance. 

Of  the  78  genes  overexpressed  in  docetaxel-sensitive 
tumours,  major  categories  were  stress  or  apoptosis, 
adhesion  or  cytoskeleton  (no  genes  with  this  function 
were  overexpressed  in  resistant  tumours),  protein 
transport,  signal  transduction,  and  RNA  splicing  or 
transport.  In  sensitive  tumours,  genes  involved  in 
apoptosis (eg, overexpression of BAX, UBE2M,
UBCH10, CUL1), and DNA damage-related gene
expression (eg, overexpression of CSNK2B, DDB1, and
ABL1, and underexpression of PRKDC) seem to
contribute  to  docetaxel  sensitivity. 

Leave-one-out  cross-validation 

In  this  cross-validation  analysis,  we  began  with  all  1628 
filtered  genes  to  avoid  selection  bias.24,25  Every  observation 
in  turn  was  left  out  and  the  remaining  samples  were  used 
to  select  differentially  expressed  genes;  we  then 
constructed  a  compound  covariate  predictor  to  classify 
the left-out sample. Ten of 11 sensitive tumours (91%
specificity [95% CI 0·59-1·00]) and 11 of 13 resistant
tumours (85% sensitivity [0·55-0·98]) were correctly
classified, for an overall accuracy of 88% (68-97%).
Results of permutation testing showed that such a high
cross-validated classification accuracy is significant
(p=0·008). The analogous predictor, constructed with 92
genes  selected  with  use  of  all  24  samples,  yielded  identical 
classification  success.  With  this  predictor,  positive  and 
negative  predictive  values  for  response  to  docetaxel  were 
92%  and  83%,  respectively,  and  the  area  under  the 
ordinary  receiver  operating  characteristic  (ROC)  curve 
was 0·96 (figure 3).
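
The exact binomial confidence intervals quoted above can be reproduced with a short calculation. The sketch below assumes the usual Clopper-Pearson construction, which the text does not name explicitly, and finds each bound by bisection on the binomial distribution function; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _root_decreasing(f, target):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials."""
    # Lower bound solves P(X >= k) = alpha/2, ie P(X <= k-1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _root_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k) = alpha/2.
    upper = 1.0 if k == n else _root_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper
```

For example, `exact_ci(10, 11)` gives an interval of roughly 0·59 to 1·00, in line with the interval reported for the sensitive tumours.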

Confirmation  of  expression  measurements 

To  confirm  measurement  of  RNA  concentrations, 
expression  values  derived  from  adjusted  Affymetrix  data 
were  correlated  with  values  from  sqRT-PCR  for  15 
variably  expressed  genes  (table  3).  Spearman  rank 
correlations  were  positive  for  13  genes  and  significantly 
positive  for  six  of  15  genes. 
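
Because both Pearson and Spearman coefficients are reported for this comparison, a short sketch of how they differ may help. The Spearman coefficient is the Pearson coefficient computed on ranks, with tied values given their average rank. The functions below are illustrative and do not handle constant inputs, for which the correlation is undefined.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A perfectly monotonic but non-linear relation between array and sqRT-PCR intensities gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1, which is one way a gene can show a higher rank correlation than linear correlation.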

Validation  in  an  independent  cohort 

The  six  additional  patients  enrolled  in  this  prospective 
clinical  study  were  studied  to  partly  validate  the  92-gene 


Figure  3:  Receiver  operating  characteristic  (ROC)  curve  for 
predicting  response  to  docetaxel 


                                       Pearson correlation   Spearman rank correlation
Gene      Affymetrix probe set  Number  r       p            r       p
ACTB      32318_s_at            5       0·81    0·09         0·90    0·04
ATP6V0E   33875_at              5       0·28    0·65         0·10    0·87
BMI-1     1728_at               8       0·90    0·002        0·21    0·61
CALM3     1158_s_at             7       0·52    0·23         0·64    0·12
FUCA1     41814_at              6       0·77    0·07         0·94    0·00
GLRX      34311_at              8       0·74    0·03         0·50    0·21
IFITM1    676_g_at              5       0·74    0·15         0·70    0·19
LAMR1     256_s_at              8       0·69    0·06         0·85    0·01
LMNA      37378_r_at            5       -0·08   0·90         -0·40   0·50
MUC1      38783_at              8       0·84    0·01         0·71    0·05
MYO10     35362_at              8       0·15    0·72         0·05    0·91
PLOD      36184_at              4       -0·41   0·59         -0·80   0·20
PSMD5     32240_at              8       0·27    0·52         0·33    0·42
SERPINB5  863_g_at              8       0·75    0·03         0·81    0·01
SPARCL1   36627_at              6       0·92    0·01         1·00    0·00

Table 3: Correlation of Affymetrix expression data with
sqRT-PCR derived values.

Correlations positive for 13 genes and significantly positive for
6 of 15 genes

predictive  classifier.  In  this  small  set,  all  six  patients  had 
sensitive  tumours  and  were  correctly  classified  by  our 
predictive  method. 

Discussion 

We obtained sufficient RNA from small core biopsy
samples of human breast cancers to assess patterns of
gene expression in individual tumours, and we identified
molecular profiles based on these patterns that
accurately predict sensitivity to docetaxel in women
with primary breast cancer.

Gene  expression  patterns  associated  with  docetaxel 
sensitivity  and  resistance  are  highly  complex.  In  the  past, 
investigators  using  single  gene  biomarkers  to  assess 
sensitivity  and  resistance  to  chemotherapy  have  seldom 
produced  conclusive  results.  For  example,  in  a  breast 
cancer  study  the  researchers  did  not  note  any  correlation 
between  commonly  measured  predictive  and  prognostic 
markers  (HER-2,  p53,  p27,  or  epidermal  growth  factor 
receptor)  and  taxane  sensitivity.27  Reports  of  different 
cancer  types  have  suggested  that  alterations  in  expression 
levels of β tubulin isoforms might represent an important
and complex mechanism of taxane resistance.28 We noted
that overexpression of some β tubulin isoforms was
associated  with  docetaxel  resistance  in  some  tumours,  but 
not  all.  These  results  suggest  that  the  patterns  of  gene 
expression  for  sensitivity  and  resistance  are  likely  to 
involve  multiple  gene  pathways,  and  that  integration  of 
many  genes  in  these  pathways  leads  to  drug  sensitivity  and 
resistance.  Our  results  lend  support  to  the  idea  that 
assessment  of  expression  of  a  few  individual  genes  will  not 
be  powerful  enough  to  untangle  the  heterogeneity  of 
clinical  breast  cancers,  but  that  patterns  of  expression  of 
many  genes  could  be  successful  in  distinguishing  between 
sensitive  and  resistant  tumours. 

A  key  point  of  this  study  was  to  focus  on  genes  that 
could  be  reliably  measured  and  to  exclude  those  that  were 
unlikely  to  be  expressed  in  any  sample.  We  did  not  design 
this  study  to  discover  specific  genes  for  docetaxel  response 
or  resistance,  but  rather  to  identify  patterns  of  many  genes 
that  could  be  used  as  a  predictive  test  in  patients  with 
breast  cancer.  As  a  result,  our  analysis  will  have  excluded 
some  differential  genes  with  low  expression,  some  of 
which might be biologically interesting. For example,
spindle checkpoint dysfunction has been suggested to be
an important cause of aneuploidy in human cancers. The
serine-threonine kinase gene STK6 (AURORA A)29
might constitute a mechanism of spindle checkpoint
deregulation, and its amplification has been shown to
predict resistance to taxanes. Indeed, we did note
differential expression between sensitive and resistant
tumours: expression of STK6 was about 1·4-fold
higher in docetaxel-resistant tumours than in those that
were sensitive to the drug (mean expression 506 and 695
in sensitive and resistant tumours, respectively; p=0·046).
Nevertheless, this gene was not part of the 92-gene
classifying list because of its overall low expression. This
classifying list does not include all genes relevant to
docetaxel sensitivity and resistance, but rather identifies
patterns of many genes that could be used as a predictive
clinical  test. 

There  is  little  information  about  the  usefulness  of  gene 
expression arrays in human breast cancers.18,30-32
Van't Veer and colleagues,18 using printed
oligonucleotide microarrays, noted that gene expression
profiles  were  more  accurate  predictors  of  outcome  in  a 
small  set  of  78  young  women  with  node-negative  breast 
cancer  than  standard  clinical  and  histological  criteria.  The 
same  investigators  subsequently  validated  this  70-gene 
classifier  in  a  cohort  of  295  patients,  many  of  whom  were 
not  in  the  original  study.31  The  signature  of  poor  prognosis 
included  genes  regulating  cell  cycle,  invasion,  metastasis, 
and  angiogenesis.  Perou  and  colleagues32  and  Sorlie  and 
colleagues31  used  cDNA  arrays  and  identified  distinct 
patterns  of  gene  expression  that  were  termed  basal  or 
luminal.  These  groups  differed  from  each  other  with 
respect  to  clinical  outcome.18-31  Unlike  these  earlier 
publications  that  dealt  with  patient  prognosis,  our  aim  was 
to  identify  gene  expression  patterns  that  could  predict 
response  or  resistance  to  docetaxel  in  patients  with 
primary  breast  cancer. 

Although  breast  cancers  are  highly  heterogeneous,  the 
classifying  gene  list  gives  some  clues  to  the  mechanisms  of 
sensitivity  and  resistance  in  some  tumours.  In  general, 
resistant  tumours  overexpressed  genes  associated  with 
protein  translation,  cell  cycle,  and  RNA  transcription 
functions,  whereas  sensitive  tumours  overexpressed  genes 
involved in stress or apoptosis, cytoskeleton, adhesion,
protein transport, signal transduction, and RNA splicing
or transport. Consistent with an apoptosis-induction
mode of action for taxanes, sensitive tumours had higher
expression of apoptosis-related proteins (eg, BAX,
UBE2M, UBCH10, CUL1). DNA damage-related gene
expression in docetaxel-sensitive tumours (overexpression
of CSNK2B, DDB1, ABL1, and underexpression of
PRKDC)  also  seems  to  contribute  to  docetaxel  sensitivity. 

Furthermore,  in  sensitive  tumours,  overexpression  of 
genes  implicated  in  stress-related  pathways  was  also 
noted,  especially  heat  shock  proteins.  Overexpression  of 
heat  shock  protein  27  (HSP27)  has  been  associated  with 
doxorubicin resistance in the MDA-MB-231 breast
cancer  cell  line.33  By  contrast,  the  same  investigators 
have  shown  that  HSP27-overexpressing  cell  lines 
remain  sensitive  to  docetaxel  (Fuqua  S,  personal 
communication),  suggesting  that  different  non-cross- 
resistant  agents  could  have  different  gene  patterns  of 
sensitivity  and  resistance.  If  true,  then  specific  patterns  of 
gene  expression  could  be  used  as  tools  to  choose  between 
doxorubicin and docetaxel.

In  a  leave-one-out  cross-validation  procedure,  the 
classifier  that  included  genes  selected  at  the  nominal  value 
of p=0·001 correctly classified tumours as sensitive or
resistant  in  nearly  90%  of  cancers.  Additionally,  the 
predictive  value  of  this  classifier  compares  very  favourably 
with  that  of  oestrogen-receptor  status,  which  is  the  only 


validated  factor  that  can  predict  response  to  hormone 
treatment  in  breast  cancer.  Oestrogen-receptor  has  a 
positive  predictive  value  for  response  to  hormone  therapy 
of  about  60%,  and  a  negative  predictive  value  of  about 
90%.34 If about 70% of breast cancers are oestrogen-
receptor positive, then sensitivity and specificity for
hormone responsive and non-responsive tumours are
about 93% and 50%, respectively, and the area under the
ROC curve for oestrogen receptor is only about 0·72.
The docetaxel classifier has positive and negative
predictive values of 92% and 83%, respectively, and an
area under the ROC curve of 0·96 (figure 3). Although
these  predictive  values  are  likely  to  be  slightly  biased  and 
have  wide  confidence  intervals,  these  results  suggest  that 
classifiers  based  on  gene  expression  would  probably 
compare  favourably  with  other  clinically  validated 
predictive  markers. 
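
The oestrogen-receptor figures in this comparison follow from the quoted predictive values and the proportion of receptor-positive tumours. The sketch below works through that arithmetic. It assumes that the area under the ROC curve of a single yes/no marker is the average of sensitivity and specificity (the area under a one-point ROC curve); the function name is illustrative.

```python
def sens_spec_auc(ppv, npv, test_positive_rate):
    """Derive sensitivity, specificity and one-point AUC from predictive values.

    ppv, npv: positive and negative predictive values of the marker.
    test_positive_rate: proportion of patients who are marker positive.
    """
    tp = test_positive_rate * ppv              # marker positive, responds
    fp = test_positive_rate * (1 - ppv)        # marker positive, no response
    tn = (1 - test_positive_rate) * npv        # marker negative, no response
    fn = (1 - test_positive_rate) * (1 - npv)  # marker negative, responds
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auc = (sensitivity + specificity) / 2      # binary marker: one-point ROC
    return sensitivity, specificity, auc
```

With a positive predictive value of 0·60, a negative predictive value of 0·90 and 70% of tumours receptor positive, this gives a sensitivity of about 0·93, a specificity of about 0·49 and an area of about 0·71, close to the rounded values quoted in the text.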

Differences  in  RNA  expression  were  confirmed  by 
sqRT-PCR  for  a  sample  of  genes.  Furthermore,  we  have 
validated  our  classifier  in  an  independent  set  of  six 
consecutively  treated  patients,  all  of  whom  responded  to 
treatment.  Although  the  validation  set  is  very  small,  it 
does  lend  support  to  the  suggestion  that  gene  expression 
arrays  could  be  used  to  predict  effectiveness  of  treatment. 

This  study  shows  that  expression  array  technology  can 
effectively  and  reproducibly  classify  tumours  according 
to  response  or  resistance  to  docetaxel  chemotherapy.  To 
ultimately  define  the  molecular  portrait  of  cancers 
sensitive  or  resistant  to  docetaxel,  our  results  should  be 
validated  in  a  study  with  a  large  independent  cohort  of 
patients.  Further  patient  recruitment  and  analysis  will 
refine  the  gene  list  by  which  to  classify  tumours.  This 
type  of  molecular  profiling  could  have  important  clinical 
implications  in  defining  the  optimum  treatment  for  an 
individual  patient,  thus  reducing  the  use  of  unproductive 
treatments, unnecessary toxicity, and overall cost.